## Introduction

Load the tidyverse, ggplot2, rtweet, and readr packages.

```{r}
library(tidyverse)
library(ggplot2)
library(rtweet)
library(readr)
```
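This data set was scraped from WineEnthusiast, a website that reviews and rates many different types of wines. We read the full data set, drop its first column (a row index), and draw a reproducible random sample of 1,000 wines.

```{r}
# Read the full WineEnthusiast data, dropping the leading row-index column
wines <- read.csv(file = '../data/winemag-data-130k-v2.csv')[, -1]

# Fix the seed so the 1,000-wine sample is identical on every run
set.seed(19630217)
wine_sample <- sample_n(wines, 1000)
```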
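The sample is reproducible only because `set.seed()` is called right before `sample_n()`. A minimal sketch of that behaviour on a tiny made-up data frame (`toy_wines` is hypothetical, not the WineEnthusiast data): resetting the seed before each draw returns the same rows both times.

```r
library(dplyr)

# Hypothetical stand-in for `wines`; these rows are illustrative only
toy_wines <- data.frame(
  province = c("California", "Tuscany", "Bordeaux", "Mendoza", "Mosel"),
  points   = c(90, 88, 92, 85, 89),
  price    = c(40, 25, NA, 15, 30)
)

# Draw a 3-row sample twice, resetting the seed before each draw
set.seed(19630217)
draw_1 <- sample_n(toy_wines, 3)
set.seed(19630217)
draw_2 <- sample_n(toy_wines, 3)

# Same seed, same rows
identical(draw_1, draw_2)
```

Without the second `set.seed()` call, `draw_2` would generally contain different rows.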
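## Exploratory data analysis

We start by looking at the relationship (correlation) between price and points. The DataExplorer package may speed up this step; see [this guide](https://datascienceplus.com/blazing-fast-eda-in-r-with-dataexplorer/).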
```{r}
# Scatter plot of price against points; rows with a missing price are dropped
wines %>%
  ggplot() +
  geom_point(mapping = aes(x = points, y = price), na.rm = TRUE)
```
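The scatter plot shows the price/points relationship visually; a correlation coefficient summarises it in a single number. A minimal sketch using base R's `cor()`, assuming rows with a missing price should simply be skipped (`use = "complete.obs"`). Spearman's rank correlation is included alongside Pearson's on the assumption that the relationship may be monotonic without being linear. The helper `price_points_cor` and the `toy` data are hypothetical.

```r
# Pearson and Spearman correlation between price and points,
# computed only on rows where both values are present
price_points_cor <- function(df) {
  c(
    pearson  = cor(df$price, df$points, use = "complete.obs"),
    spearman = cor(df$price, df$points, use = "complete.obs",
                   method = "spearman")
  )
}

# Illustrative toy data, not real wine reviews
toy <- data.frame(points = c(85, 88, 90, 92, 95),
                  price  = c(12, 20, NA, 45, 80))
toy_cor <- price_points_cor(toy)
toy_cor
```

On the real data this would be called as `price_points_cor(wines)`.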
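Next, we compute summary statistics for price and points.

```{r}
# Summary statistics for price (missing prices are ignored)
wines %>%
  summarize(mean_price = mean(price, na.rm = TRUE),
            min_price  = min(price, na.rm = TRUE),
            max_price  = max(price, na.rm = TRUE),
            sd_price   = sd(price, na.rm = TRUE))

# Summary statistics for points
wines %>%
  summarize(mean_points = mean(points, na.rm = TRUE),
            min_points  = min(points, na.rm = TRUE),
            max_points  = max(points, na.rm = TRUE),
            sd_points   = sd(points, na.rm = TRUE))
```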
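The two summaries repeat the same four statistics for each column. A sketch that computes them for both columns in one `summarise()` call with dplyr's `across()`, assuming dplyr 1.0.0 or later is installed; the `toy` data frame is illustrative only.

```r
library(dplyr)

# Illustrative toy data, not real wine reviews
toy <- data.frame(points = c(85, 88, 90, 92),
                  price  = c(12, 20, NA, 45))

# Mean, min, max and sd for both columns; NA values are dropped per statistic
summary_stats <- toy %>%
  summarise(across(c(price, points),
                   list(mean = ~ mean(.x, na.rm = TRUE),
                        min  = ~ min(.x, na.rm = TRUE),
                        max  = ~ max(.x, na.rm = TRUE),
                        sd   = ~ sd(.x, na.rm = TRUE))))
summary_stats
```

With a named list of functions, `across()` names the output columns `{column}_{function}`, for example `price_mean` and `points_sd`. On the real data, replace `toy` with `wines`.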
Next, we look for the best wine-producing province in the sample. We compute the average points across all 1,000 sampled wines, then average the points within each province and keep the provinces that score above that overall mean, ranked from highest to lowest. The first row is the best province.

```{r}
# Average points across all 1,000 sampled wines (one overall mean)
overall_mean_points <- wine_sample %>%
  summarise(points = mean(points, na.rm = TRUE)) %>%
  pull(points)
overall_mean_points

# Average points per province, keeping only provinces above the overall mean
best_province <- wine_sample %>%
  group_by(province) %>%
  summarise(mean_points = mean(points, na.rm = TRUE),
            n_wines = n(), .groups = "drop") %>%
  filter(mean_points > overall_mean_points) %>%
  arrange(desc(mean_points))
best_province
```
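When provinces are ranked by their average points, a province with only one or two wines in the sample can land at the top on the strength of a single high score. A minimal sketch of one remedy, assuming a hypothetical cutoff `min_wines` that would need tuning for the real sample: count the wines per province and drop provinces below the cutoff before ranking. The `toy_sample` data is made up for illustration.

```r
library(dplyr)

# Hypothetical toy sample: province "B" has a single, very high score
toy_sample <- data.frame(
  province = c("A", "A", "A", "B", "C", "C"),
  points   = c(90, 92, 88, 97, 86, 87)
)

min_wines <- 2  # hypothetical minimum number of wines per province

# Rank provinces by mean points, ignoring thinly represented provinces
province_ranking <- toy_sample %>%
  group_by(province) %>%
  summarise(mean_points = mean(points), n_wines = n(), .groups = "drop") %>%
  filter(n_wines >= min_wines) %>%
  arrange(desc(mean_points))
province_ranking
```

Here province "B" is excluded despite its 97-point wine, and "A" (mean 90 over three wines) ranks first.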